Summary: In this presentation, we introduce a new method for detecting change points in the Hurst index of a piecewise fractional Brownian motion. We first define the model and the statistical problem. The proposed method transposes the FDpV method to the estimation of the Hurst index. The FDpV method (filtered derivative with p-value) was introduced by Bertrand et al. (2011) to detect change points in the mean. The statistic underlying the FDpV procedure is a new estimator of the Hurst index, called the Increment Bernoulli Statistic (IBS). Both the FDpV and IBS methods have linear complexity with respect to the length of the observed series, in computation time as well as in memory, and hence so does their combination.

Key words: Change point detection, Filtered Derivative, piecewise fractional Brownian motion, Hurst parameter, Increment Bernoulli Statistic.

Abstract: In this presentation, we introduce a new method for change point analysis on the Hurst
index of a piecewise fractional Brownian motion. We first set out the model and the statistical problem. The
proposed method is a transposition of the FDpV (Filtered Derivative with p-value) method, introduced
for the detection of change points in the mean by Bertrand et al. (2011), to the case of changes in the
Hurst index. The statistic underlying the FDpV procedure is a new estimator of the Hurst
index, the so-called Increment Bernoulli Statistic (IBS). Both FDpV and IBS have linear time
and memory complexity with respect to the size of the series. Thus the resulting method for change
point analysis on the Hurst index also has linear complexity.

Keywords: Change point analysis, Filtered derivative with p-value method, Hurst parameter, 
Increment Bernoulli Statistic, piecewise fractional Brownian motion. 



Introduction 

Recent measurement methods allow us to record and store large data sets, the so-called "data
deluge". For instance, today's technology allows heartbeat series to be recorded over 24 hours,
leading to data sets of size n > 100,000, and very high frequency (VHF) financial series lead to
data sets of size n > 40,000. Tomorrow, many other series will be recorded at VHF, leading to millions
of data points.

Large or huge series sampled on a fine time mesh can be described as continuous time processes
observed at discrete times. Such a stochastic process X belongs to a certain class of models, that
is $X \in \mathcal{M} = \{X_\theta, \theta \in \Theta\}$, where $\Theta$ is a subset of $\mathbb{R}^d$ and d is the dimension of the model. The
structural parameter $\theta$ is believed to provide relevant information on the system which generates
the series, and statisticians have to estimate it.

A slightly different approach is based on change point analysis: the structural parameter
$\theta$ is assumed to be piecewise constant with an unknown configuration of changes $\tau$. In this
framework, the first task of statisticians is to estimate the location of the change points,
and the second one could be to estimate the structural parameters between change points.
There has been a huge literature on change point analysis and model selection since the fifties, see e.g.
Basseville and Nikiforov (1993), Brodsky and Darkhovsky (1993), Csörgő and Horváth (1997),
Birgé and Massart (2007) or Bertrand et al. (2011) and the references therein. However, most of
the studies are devoted to changes in the mean, in the variance or in the regression parameters.
Relevant information is also provided by the time dependence structure of the process, see
e.g. Ayache and Bertrand (2011) and Khalfa et al. (2011). Fractional Brownian motion (fBm) is
a paradigmatic example of such a process: fBm is a zero mean Gaussian process depending
on two parameters, the Hurst index H, linked to the time structure, and a scale parameter $\sigma$.

In this presentation, we consider a simple model, namely a process X which is a piecewise fBm
with an unknown configuration of changes. Moreover, we place ourselves in the framework of huge data sets,
and we focus our attention on time and memory complexity. These two reasons have led us to
propose a new change point procedure for detecting changes in the Hurst index of a piecewise
fBm. Our new procedure combines, on the one hand, the FDpV method, introduced
in Bertrand et al. (2011) for fast and light detection of changes in the mean, variance or regression
parameters, and on the other hand the Increment Bernoulli Statistic (IBS), a new estimator of the
Hurst index, which is a variation on the Increment Ratio Statistic (IRS) estimator introduced in
Bardet and Surgailis (2010).

The rest of this paper is organized as follows. First, in Section 1, we define our model of
piecewise fBm. Next, in Section 2, we introduce a new fast and robust estimator of the Hurst
index of fBm, namely the Increment Bernoulli Statistic. Then, in Section 3, we describe the
transposition of the FDpV method to the Hurst index.

1 Our model 

We observe a process X at the discrete and regularly spaced times $t_i = i/n$, where $i = 0, \dots, n$.
We assume the existence of a segmentation $\tau = (\tau_k)_{k=0,\dots,K+1}$ with $0 = \tau_0 < \tau_1 < \dots < \tau_K < \tau_{K+1} = 1$,
such that the restriction of the process X to each interval $(\tau_k, \tau_{k+1})$, for $k = 0, \dots, K$,
is a fBm with Hurst index $H_k$ and scale parameter $\sigma_k$. The integer K corresponds to the number
of change points and $(K + 1)$ to the number of segments. We stress that K can be zero, and in this
case the process X is a fBm.

Let us make precise that, in our setting, the process X should have almost surely continuous
paths. For this reason, the so-called piecewise fBm cannot be defined by plugging a piecewise constant
Hurst index into one of the representations of the fBm. Indeed, by doing so, the process X
would almost surely have a discontinuity at each change point of the Hurst index, and a method for
detecting changes in the mean would then also be efficient for detecting changes in the Hurst index. We
refer to Taqqu and Samorodnitsky (1994) as a reference book on the different representations of
fBm and to Ayache and Taqqu (2005) for the construction of multi-fBm by plugging a continuous
time varying Hurst index into one of the fBm representations. A rather complicated solution to
avoid the drawback of pathwise discontinuities due to Hurst index discontinuities was proposed
in Benassi et al. (2000), co-authored by the first author. However, as pointed out by Antoine
Ayache during private conversations held in 2004, for statistical applications it suffices to cancel
the discontinuity by adding a correction term. The same solution is also adopted in Bardet and
Kammoun (2008).

The model having been specified, we are concerned with change point analysis on the Hurst
parameter, where the number of changes K is unknown. There are few references on this problem.
To the best of our knowledge, the only references are Benassi et al. (2000) and Bardet and
Kammoun (2008).

2 Increment Bernoulli Statistic for fBm 

In this section, we investigate the properties of a new estimator of the Hurst parameter of
fBm, namely the Increment Bernoulli Statistic (IBS). The IBS is a variation on the IRS, which was
introduced by Surgailis et al. (2008) and applied to fBm by Bardet and Surgailis (2010). Both
the IRS and the IBS are fast and robust estimators of the Hurst index. By fast we mean an estimator with
linear time complexity, and by robust we mean an estimator that is invariant under scaling. The
choice of the IBS instead of the IRS is motivated by the fact that the IBS is slightly less expensive,
in terms of time complexity, than the IRS.

In the next section, the IBS is used as the underlying estimator for the FDpV method, see
Section 3. For this reason, we define the IBS for any process X, even though we apply it to fBm in this
section. Let X be a process observed at a family of discrete times $t_k$. We define the second order
increments by

$$\Delta X(t_k) := X(t_{k+2}) - 2X(t_{k+1}) + X(t_k).$$

The Increment Bernoulli Statistic (IBS) is based on the comparison of the signs of consecutive
second order increments. The result of each comparison equals 1 if the two consecutive
second order increments have the same sign, and 0 otherwise. This explains the name of
our new estimator, the Increment Bernoulli Statistic (IBS), which is given by

$$\mathrm{IBS}_n(X) := \frac{1}{n-2} \sum_{k=0}^{n-3} \psi\big(\Delta X(t_k), \Delta X(t_{k+1})\big),$$

where $\psi(\cdot,\cdot)$ is defined by $\psi(x, y) = 1$ if $\mathrm{sign}(x) = \mathrm{sign}(y)$ and $\psi(x, y) = 0$ otherwise, and $\mathrm{sign}(z)$
denotes the sign of z. Let us remark that the IBS is scale invariant: since $\psi(ax, ay) = \psi(x, y)$
for $a > 0$, we have $\mathrm{IBS}_n(aX) = \mathrm{IBS}_n(X)$.

When X is a fBm, that is $X = B_H$ with a Hurst index $H \in (0, 1)$, the IBS converges to
$\Lambda(H)$, a continuous, monotonically increasing function of H defined as follows:

$$\Lambda(H) := \lambda(\rho(H)), \qquad \lambda(r) := \frac{1}{\pi} \arccos(-r), \qquad \rho(H) := \frac{-3^{2H} + 2^{2H+2} - 7}{8 - 2^{2H+1}},$$

where $\rho(H) \in (-1, 1)$ is the correlation between two successive second order increments.
The graph of $\Lambda(H)$ is given in Figure 1. Since $\Lambda(\cdot)$ is an invertible function,
it is easy to deduce an estimator of the Hurst parameter H, given by $\hat{H}_n = \Lambda^{-1}(\mathrm{IBS}_n(B_H))$.
Furthermore, we note that the function $\phi(\cdot,\cdot) = \psi(\cdot,\cdot) - \Lambda(H)$ has Hermite rank
equal to 2. Then, by applying the Breuer-Major theorem, see e.g. Arcones (1994) [Theorem 4,
p. 2256] or Nourdin et al. (2010) [Theorem 1, p. 2], we can deduce the following Central Limit
Theorem (CLT):

$$\sqrt{n}\,\big(\mathrm{IBS}_n(B_H) - \Lambda(H)\big) \xrightarrow{\mathcal{D}} \mathcal{N}\big(0, \sigma^2(H)\big),$$

where $\xrightarrow{\mathcal{D}}$ means convergence in distribution and the asymptotic variance $\sigma^2(H)$ is
given by

$$\sigma^2(H) = \sum_{j \in \mathbb{Z}} \mathrm{cov}\Big(\psi\big(\Delta B_H(t_0), \Delta B_H(t_1)\big),\ \psi\big(\Delta B_H(t_j), \Delta B_H(t_{j+1})\big)\Big).$$
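To make the definition of the IBS concrete, here is a minimal Python sketch. The function names (`sign`, `second_order_increments`, `psi`, `ibs`) are illustrative choices, not taken from the paper, and the normalisation by the number of compared pairs is an assumption consistent with the sum over $k = 0, \dots, n-3$.

```python
def sign(z):
    """Return -1, 0 or 1 according to the sign of z."""
    return (z > 0) - (z < 0)


def second_order_increments(x):
    """Second order increments x[k+2] - 2*x[k+1] + x[k], for k = 0..len(x)-3."""
    return [x[k + 2] - 2 * x[k + 1] + x[k] for k in range(len(x) - 2)]


def psi(a, b):
    """Bernoulli outcome: 1 if a and b have the same sign, 0 otherwise."""
    return 1 if sign(a) == sign(b) else 0


def ibs(x):
    """Fraction of consecutive second order increments sharing the same sign."""
    d = second_order_increments(x)
    if len(d) < 2:
        raise ValueError("at least 4 observations are needed")
    same = sum(psi(d[k], d[k + 1]) for k in range(len(d) - 1))
    return same / (len(d) - 1)


convex = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]  # every second order increment equals 2
zigzag = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]    # second order increments alternate in sign
print(ibs(convex), ibs(zigzag))
# Scale invariance: multiplying the path by a positive constant leaves the IBS unchanged.
print(ibs([3.0 * v for v in zigzag]) == ibs(zigzag))
```

A single pass over the data suffices, which matches the linear time and memory complexity claimed for the statistic.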

The main advantages of the IBS method are, first, its efficiency in terms of time and memory
complexity and, second, its robustness with respect to the scaling of the fBm. First,
we compute by recurrence the second order increments $(\Delta B_H(t_k))_{0 \le k \le n-2}$. This first step has
time and memory complexity $O(n)$. Next, computing $\mathrm{IBS}_n(B_H)$ requires
roughly $n$ tests, $n \times p_a(H)$ additions and 1 division, where $p_a(H) = \Lambda(H) \in (0, 1)$ is
the probability that two consecutive second order increments have the same sign. Then, we apply
the Newton algorithm to compute the inverse of the function $\Lambda(\cdot)$. Moreover, we note that the
function $\psi(\cdot,\cdot)$ satisfies the scale invariance property, i.e. for all nonzero $C \in \mathbb{R}$, $\psi(C \cdot X, C \cdot Y) = \psi(X, Y)$.
This means that multiplying $B_H$ by any scaling coefficient C does not affect the
estimation of the Hurst index, since $\mathrm{IBS}_n(B_H) = \mathrm{IBS}_n(C \cdot B_H)$. This shows the robustness
of the IBS method with respect to scaling.
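The inversion of $\Lambda$ can be sketched in a few lines of Python. This sketch assumes the closed form $\Lambda(H) = \frac{1}{\pi}\arccos(-\rho(H))$, the probability that two jointly Gaussian variables with correlation $\rho(H)$ share the same sign. For simplicity it swaps in a bisection search for the Newton iteration mentioned above; bisection relies only on the monotonicity of $\Lambda$. The names `rho`, `big_lambda` and `hurst_from_ibs` are illustrative.

```python
import math


def rho(h):
    """Correlation of two successive second order increments of fBm, for 0 < h < 1."""
    num = -(3.0 ** (2.0 * h)) + 2.0 ** (2.0 * h + 2.0) - 7.0
    den = 8.0 - 2.0 ** (2.0 * h + 1.0)
    return num / den


def big_lambda(h):
    """Probability that two successive second order increments share the same sign."""
    return math.acos(-rho(h)) / math.pi


def hurst_from_ibs(value, lo=1e-6, hi=1.0 - 1e-6, tol=1e-10):
    """Invert the increasing map big_lambda on (lo, hi) by bisection."""
    # Clamp the observed statistic to the range reachable by big_lambda.
    value = min(max(value, big_lambda(lo)), big_lambda(hi))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if big_lambda(mid) < value:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip: start from H = 0.7, map to Lambda(H), then invert.
print(big_lambda(0.7), hurst_from_ibs(big_lambda(0.7)))
```

At $H = 1/2$ the correlation is $\rho = -1/2$, so $\Lambda(1/2) = \arccos(1/2)/\pi = 1/3$, which gives a quick sanity check of the formula.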



Figure 1: The graph of A(H). 



3 Filtered Derivative with p-Value method

In this section, we describe the Filtered Derivative with p-value (FDpV) method. First, we
define the Filtered Derivative function. Next, we describe the two steps of the FDpV method:
Step 1 is based on the Filtered Derivative and selects the potential change points, whereas Step 2
calculates the p-value associated with each potential change point, in order to separate right change
points from false alarms.

First, we note that $\Lambda(H)$ is a continuous, monotonically increasing function of H, see Figure 1.
So, detecting change points in the Hurst parameter H is equivalent to detecting change
points in $\Lambda(H)$. Consequently, the estimator $\mathrm{IBS}_n(B_H)$ of the parameter $\Lambda(H)$ is used as the
underlying estimator for the FDpV method. We refer to Bertrand et al. (2011) and Bertrand
and Fhima (2009) for the introduction of the FDpV technology and its numerical efficiency. Let us
stress that choosing the direct estimator $\hat{H}_n = \Lambda^{-1}(\mathrm{IBS}_n(B_H))$ as the underlying estimator
for the FDpV would be more expensive in terms of numerical complexity.

Filtered Derivative function 

Let X be a piecewise fBm observed at a family of discrete times $t_j = j/n$, for $j = 0, \dots, n$. The
Filtered Derivative for the IBS is defined as the difference between the estimators of the parameter
$\Lambda(H)$ computed on two sliding windows, to the right and to the left of the index k,
both of size A. It is specified by the function

$$D(k, A) = \mathrm{IBS}(X, k, A) - \mathrm{IBS}(X, k - A, A) \quad \text{for } k \in [A, n - A], \qquad (1)$$

where

$$\mathrm{IBS}(X, k, A) = A^{-1} \sum_{j=k+1}^{k+A} \psi\big(\Delta X(t_j), \Delta X(t_{j+1})\big)$$

is an estimator of $\Lambda(H)$ on the sliding box $[k + 1, k + A]$. It is easy to see that the Filtered
Derivative function D can be computed by recurrence with linear time and memory complexity. In short,
this method consists in filtering the data, by computing the estimators of the parameter $\Lambda(H)$,
before applying a discrete derivative. This construction explains the name Filtered Derivative
method, given by Benveniste and Basseville (1984).
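The linear-time computation of D can be sketched as follows in Python. The sketch assumes the Bernoulli outcomes $s_j = \psi(\Delta X(t_j), \Delta X(t_{j+1}))$ have already been stored in a list `s`; it uses cumulative sums, which play the same role as the recurrence mentioned above. The function name `filtered_derivative`, and the choice to return 0.0 at indices where one of the two windows would leave the data, are illustrative assumptions.

```python
def filtered_derivative(s, window):
    """Return d with d[k] = mean(s[k+1 .. k+window]) - mean(s[k-window+1 .. k]).

    Indices k for which one of the two windows is incomplete are left at 0.0.
    """
    m = len(s)
    if window < 1 or 2 * window >= m:
        raise ValueError("window must satisfy 1 <= window < len(s) / 2")
    # prefix[i] holds s[0] + ... + s[i-1], so any window sum costs O(1).
    prefix = [0]
    for v in s:
        prefix.append(prefix[-1] + v)
    d = [0.0] * m
    for k in range(window, m - window):
        right = prefix[k + window + 1] - prefix[k + 1]
        left = prefix[k + 1] - prefix[k - window + 1]
        d[k] = (right - left) / window
    return d


# Toy sequence of Bernoulli outcomes whose mean jumps from 0 to 1 after index 9.
s = [0] * 10 + [1] * 10
d = filtered_derivative(s, window=5)
print(max(range(len(d)), key=lambda k: abs(d[k])))
```

On this toy input the largest value of $|D|$ is reached at the last index before the jump, where the left window contains only zeros and the right window only ones.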

Step 1: Detection of potential change points 

In order to detect the potential change points, we test the null hypothesis $(\mathcal{H}_0)$ of no change
in the Hurst parameter H against the alternative hypothesis $(\mathcal{H}_1)$ indicating the existence of at
least one change point:

$(\mathcal{H}_1)$: There is an integer $K \in \mathbb{N}^*$ and $0 = \tau_0 < \tau_1 < \dots < \tau_K < \tau_{K+1} = n$ such that
$$H_1 = \dots = H_{\tau_1} \ne H_{\tau_1 + 1} = \dots = H_{\tau_2} \ne \dots \ne H_{\tau_K + 1} = \dots = H_{\tau_{K+1}},$$
where $H_j = H_{\tau_k}$ is the value of the Hurst parameter at $t_j \in [\tau_{k-1}/n, \tau_k/n)$.

Now, we fix the probability of type I error at level $p_1^*$, and we determine the corresponding critical
value $C_1$ given by

$$\mathbb{P}\Big( \max_{k \in [A : n-A]} |D(k, A)| > C_1 \,\Big|\, (\mathcal{H}_0) \text{ is true} \Big) = p_1^*.$$

Of course, such a probability is usually not available, so we only consider the asymptotic
distribution of the maximum of $|D|$. Then, $\tau_k$ is selected as a potential change
point if the corresponding local maximum satisfies $|D(\tau_k, A)| > C_1$. The graph of the function
$|D|$ shows not only the "right hats" (circled in green in Figure 2), which give the
right change points, but also false alarms (circled in black in Figure 2). Consequently, we
have introduced a second step in order to keep only the right change points.
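One simple way to extract the local maxima of $|D|$ that exceed the critical value is sketched below in Python. It is an assumption about how the selection can be organised, not a transcription of the authors' code: the global maximum is picked repeatedly, and its neighbourhood of half-width `window` is then erased so that each "hat" yields a single candidate. The name `potential_change_points` is illustrative, and the threshold $C_1$ is supplied by the caller.

```python
def potential_change_points(d, window, threshold):
    """Indices of successive maxima of |d| above threshold, one per neighbourhood."""
    if threshold < 0 or window < 0:
        raise ValueError("threshold and window must be non-negative")
    magnitude = [abs(v) for v in d]
    found = []
    while magnitude:
        k = max(range(len(magnitude)), key=magnitude.__getitem__)
        if magnitude[k] <= threshold:
            break
        found.append(k)
        # Erase the whole hat around k so that it is not selected twice.
        lo = max(0, k - window)
        hi = min(len(magnitude), k + window + 1)
        for j in range(lo, hi):
            magnitude[j] = 0.0
    return sorted(found)


# Toy filtered derivative: two hats above the threshold, one small bump below it.
d = [0.0] * 30
d[10], d[11], d[20], d[25] = 0.9, 0.8, -0.7, 0.2
print(potential_change_points(d, window=3, threshold=0.5))
```

Each pass of the loop costs one scan of the series, so this naive version is linear in the series length for each detected candidate.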

Step 2: Elimination of false alarms 

The list of potential change points $(\tau_1, \dots, \tau_{K_{\max}})$ obtained at Step 1 contains right change points
but also false alarms. In the second step, a test is carried out to remove the false alarms from
the list of change points found at Step 1. More precisely, for each potential change point $\tau_k$, we
test whether or not the Hurst parameter is the same on the two successive intervals $(\tau_{k-1}/n, \tau_k/n)$ and
$(\tau_k/n, \tau_{k+1}/n)$. Formally, for all $1 \le k \le K_{\max}$, we apply the following hypothesis test:

$$(\mathcal{H}_{0,k}) : H_k = H_{k+1} \quad \text{versus} \quad (\mathcal{H}_{1,k}) : H_k \ne H_{k+1},$$

where $H_k$ is the value of H on the segment $(\tau_{k-1}/n, \tau_k/n)$. By using this second test, we
calculate new p-values $(p_1, \dots, p_{K_{\max}})$ associated respectively with the potential change points
$(\tau_1, \dots, \tau_{K_{\max}})$. Then, we only keep the change points which have a p-value smaller than a
critical level denoted $p_2^*$. By doing so, we obtain a subset $(\hat{\tau}_1, \dots, \hat{\tau}_{\hat{K}})$ of the first list, which
represents the estimators of the change points in the Hurst parameter of the piecewise fBm.

Figure 2: Detection of potential change points. Above: Simulated piecewise fBm with five change
points in the Hurst parameter. Below: Filtered Derivative function $|D|$.
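Step 2 can be illustrated by the following Python sketch, which is only one possible reading of the test. It assumes a two-sample z-test motivated by the CLT of Section 2: the IBS values on the two neighbouring segments are treated as independent and approximately Gaussian, and their common asymptotic variance is passed in as a hypothetical plug-in parameter `sigma2`, since the text does not specify how $\sigma^2(H)$ is estimated. The function names are illustrative, and `s` denotes the list of Bernoulli outcomes $\psi(\Delta X(t_j), \Delta X(t_{j+1}))$.

```python
import math


def two_sided_p_value(ibs_left, n_left, ibs_right, n_right, sigma2):
    """p-value of a two-sample z-test on two IBS values (sigma2 is an assumed plug-in)."""
    z = (ibs_left - ibs_right) / math.sqrt(sigma2 * (1.0 / n_left + 1.0 / n_right))
    return math.erfc(abs(z) / math.sqrt(2.0))


def eliminate_false_alarms(s, candidates, sigma2, level):
    """Keep the candidates whose p-value, computed on adjacent segments, is below level."""
    points = sorted(set(candidates))
    if any(c <= 0 or c >= len(s) for c in points):
        raise ValueError("candidates must lie strictly inside the series")
    bounds = [0] + points + [len(s)]
    kept = []
    for i in range(1, len(bounds) - 1):
        left = s[bounds[i - 1]:bounds[i]]
        right = s[bounds[i]:bounds[i + 1]]
        p = two_sided_p_value(sum(left) / len(left), len(left),
                              sum(right) / len(right), len(right), sigma2)
        if p < level:
            kept.append(bounds[i])
    return kept


# Toy outcomes: the mean stays at 0.5 across index 100 and jumps to 1 at index 200.
s = [0, 1] * 100 + [1] * 100
print(eliminate_false_alarms(s, candidates=[100, 200], sigma2=0.25, level=0.05))
```

In this toy run the candidate at index 100 separates two segments with the same mean and is discarded, while the candidate at index 200 is kept.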



Conclusion 

In conclusion, the combination of the FDpV and IBS methods provides a fast
(in time) and cheap (in memory) algorithm for the detection of change points in the Hurst parameter
of a piecewise fBm. This algorithm is therefore well suited to segmenting random signals from large data sets.
In future work, we will develop the FDpV + IBS method in order to detect abrupt changes in the
parameters of real data drawn from the financial and physiological domains.